I’ve usually heard this phenomenon called “incidental duplication,” and it’s something I find myself teaching junior engineers about quite often.
There are a lot of situations where 3-5 lines of many methods follow basically the same pattern, and it can be aggravating to look at. “Don’t repeat yourself!” Right?
So you try to extract that boilerplate into a method, and it’s fine until the very next change. Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about, because it’s actually handling a dozen cases that are superficially similar but full of important differences in the details.
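A minimal Go sketch of how that decay tends to look (all names hypothetical): the helper starts as three shared lines and sprouts a flag per caller until nobody can say what it does.

    package report

    import (
        "fmt"
        "io"
    )

    // writeHeader began as three identical lines shared by three handlers.
    // Two feature requests later it has grown flags that only some callers
    // use, and every new case adds another conditional to reason about.
    func writeHeader(w io.Writer, title string, withNav, compact, adminLinks bool) {
        fmt.Fprintf(w, "== %s ==\n", title)
        if withNav && !compact {
            fmt.Fprintln(w, "[home] [back]")
        }
        if adminLinks {
            fmt.Fprintln(w, "[admin]")
        }
    }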
I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change. Then one of two things is likely to happen:
(1) you find that the code doesn’t look so repetitive anymore,
or, (2) you hit a bug where you needed to make the same change to the boilerplate in six places and you missed one.
In scenario 1, you can sigh and say “yeah it turned out to be incidental duplication, it’s not bothering me anymore.” In scenario 2, it’s probably time for a careful refactoring to pull out the bits that have proven to be identical (and, importantly, must be identical across all of the instances of the code).
My CTO often asks me to implement a feature to do X and make it “generic enough to handle future use cases”. My answer is always the same - either give me at least three use cases now or I am going to make it work with this one use case. If we have another client that needs the feature in the future then we will revisit it.
Of course, there are some features where we know in advance, based on the industry, how to genericize them.
From (I think) an old Joshua Bloch talk on API design, paraphrased:
* If you generalise based on one example, you will get a flexible API that can handle only that example.
* If you generalise based on two examples, you will get a flexible API that can switch between those two examples.
* If you generalise based on three examples, you have a chance of abstracting over the common essence.
Hmm. I wonder if he got that from Simon while he (JB) was at CMU. He (HS) once said to me, jokingly, I think, “One makes an observation, two makes a generalization, three makes a proof”.
The Rule of 3 is a great rule, except when it isn't.
I had a colleague some time ago who wrote a couple of data importers for FAA airspace boundaries. There were two data feeds we cared about, "class airspace" and "special use airspace". These airspace feeds have nearly identical formats, with altitudes, detailed boundary definitions, and such. There are a few minor differences between the two, for example different instructions for when a special use airspace may be active. But they are about 95% the same.
The developer wrote completely separate data definitions and code for the two. The data definitions mostly looked the same and used the same names for corresponding fields. And the code was also nearly the same between the two (in fact exactly the same for most of it).
It was clear that one importer was written first, and then the code and data structures were copied and pasted and updated in minor ways to create the second.
Because the data structures were unique for each (even if they looked very similar in the source code), this impacted all the downstream code that used this data. If you saw a field called FAA_AIRSPACE_MIN_ALTITUDE, you had to be sure not to confuse the class airspace vs. special use airspace, because each of these had a field of the same name, the compiler wouldn't catch you if you used the wrong one, and you might have the wrong offset into the data structure.
I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
Yep, the Rule of 3.
In this case there were only ever going to be two. And the two were nearly the same with only minor differences.
But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
I still like the Rule of 3 as a general principle, and I have espoused it myself. But I think it is best applied in cases where the similarities are less clear, where it seems that there may be something that could be refactored and combined, but it's not yet clear what the similarities are.
I think it is a very bad rule in a case like this, where it should be obvious from the beginning that there are many more similarities than differences.
Every single rule or piece of advice in programming is good until it isn't. OOP is good until it isn't, functional programming is good until it isn't, premature optimization is the root of all evil until it is the root of all good.
For some reason, humans have this deep need to try and boil things down to bulleted lists, which in the domain of programming are just incredibly unhelpful.
This is because programming is an art. It has fundamental components (objects, functions, templates, iterators, etc) like in visual design (point, line, shape, value, etc).
I think engineers should read the old and the new C++ books and master that language to know the evolution of all these paradigms and how to use them. There’s so much wisdom in the “Effective C++” series and Gang of Four and “C++ Templates: The Complete Guide”, to name a few.
Problem is, in this “start-up culture” of banging things out and getting them working, the art is left behind. Just like many other arts.
I was part of an uncredited team interviewed by Meyers for "Effective Modern C++". Always thought it was somewhat ironic. At the firm (which declined acknowledgments because they shun media attention) we weren't even using a C++11 compiler on either Linux or Windows at the time. Yet I recognize at least two of the patterns as ones I was at least a partial contributor to.
Myself, I lost track of C++ standards changes with C++17, and I've not been using C++ for the last several years.
I still love the power and speed, but right now I'm doing more ETL work, and Python is a better and more productive language for that.
I think that you have a point, but I often find myself citing guidelines or rules when I am evaluating code decisions or questioning code design. Maybe it depends on your interpretation of the phrases: some sayings should be followed religiously, while others should be applied with discretion.
The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
But I guess no one explicitly spells this out, so I could see where someone could become confused.
Premature abstraction. The Rule of 3 helps. But I found a better principle for it:
Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking. If they do change, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract, they enable developers to express business logic more elegantly.
The challenge is to first pick the part of the code that can be made primitive-like, and make it a top priority.
This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such hype in their time and are still used today. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the last, which is not a thing anymore).
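One concrete way to get "if parameters are added, defaults are provided" is Go's functional-options pattern; a minimal sketch (all names hypothetical):

    package fetch

    import "time"

    type config struct {
        timeout time.Duration
        retries int
    }

    // An Option mutates the config, so new knobs can be added later
    // without breaking a single existing call site.
    type Option func(*config)

    func WithTimeout(d time.Duration) Option { return func(c *config) { c.timeout = d } }
    func WithRetries(n int) Option           { return func(c *config) { c.retries = n } }

    func Fetch(url string, opts ...Option) error {
        c := config{timeout: 10 * time.Second, retries: 3} // defaults first
        for _, opt := range opts {
            opt(&c)
        }
        // ... perform the request using c ...
        return nil
    }

Fetch(u) keeps compiling unchanged when WithRetries is added years later, which is the primitive-like stability described above.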
I find abstractable parts in code by thinking about potential names for it. If there is no good name for a potential common function, it's probably not a good idea to extract the section into a function. Maybe it can be extracted into two separate functions with good names?
> The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.
I would say it’s for entire features sometimes. For instance, we are a B2B company. We have features on the roadmap or a feature might be suggested by a client. Either way, you hit the jackpot if you can get one client to pay for a feature that doesn’t exist in your product that you can then sell to other clients.
The problem is that you don’t know whether the feature is generally useful to the market. But you think it might be, in that case, you build the feature in a way that is useful to the paying client and try not to do obvious things that are client specific. But until you have other clients you don’t know.
That's been my experience doing embedded stuff. Customers will tell you they want a feature. You'll implement it then find out they don't need it enough to devote resources to utilize it on their end. So then it just lingers in the code base. And never gets real world testing. Lately I've been pulling unused features and consigning the code to the land of misfit toys. Just so I don't have to maintain them.
If that’s the case and they were willing to fully pay for the feature, at professional services + markup hours, it wasn’t a loss. In our case, we still got unofficial professional services and recurring subscription revenue, and now we have a “referenceable client”.
If it’s behind a per client feature flag, it doesn’t cause any harm.
> I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."
I would argue this is a good rule of thumb, but nothing should ever be a hard unbreakable rule. The usual refactoring rules should apply too: if 95% of the code is the same, you should probably go back and refactor even without a third use case because it sounds like the two use cases are really just one use case with slight variation.
If the compiler didn’t catch it, doesn’t that say it was modeled incorrectly?
Why not have an IAirSpace interface or an abstract AirSpace class with two specializations? If there were processes that could handle either, they should take an AirSpace class; one that could only handle one or the other would take the specialization.
If the steps were the same for handling both, have step1...step(n) defined in the concrete class and have a coordinating “service” that just calls the steps and takes in an IAirSpace.
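Go has no abstract classes, but a shared embedded core plus an interface gets the same shape; a rough sketch with hypothetical names (the original language isn't stated in the thread):

    package airspace

    type Point struct{ Lat, Lon float64 }

    // Airspace captures the ~95% the two feeds share.
    type Airspace interface {
        MinAltitude() float64
        MaxAltitude() float64
        Boundary() []Point
    }

    // core holds the shared fields and methods exactly once.
    type core struct {
        minAlt, maxAlt float64
        boundary       []Point
    }

    func (c core) MinAltitude() float64 { return c.minAlt }
    func (c core) MaxAltitude() float64 { return c.maxAlt }
    func (c core) Boundary() []Point    { return c.boundary }

    type ClassAirspace struct{ core }

    type SpecialUseAirspace struct {
        core
        ActiveTimes string // the genuine difference: activation instructions
    }

    // Downstream code that can handle either feed takes the interface.
    func Validate(a Airspace) bool { return a.MaxAltitude() >= a.MinAltitude() }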
To be honest I don't get it. What language was this in, that you couldn't duck type the data or make the data structures inherit from some shared interface that encapsulated the commonalities?
> The Rule of 3 is a great rule, except when it isn't.
"Rules are for the guidance of wise men and the obedience of fools."
It's unfortunate that you had to deal with a fool, but that's not an indictment of the particular rule that they picked to follow off a proverbial cliff.
Thanks for your reply. I just want to note that what appears to be a quote in your comment ("Rules are for the guidance of wise men and the obedience of fools") is not something I wrote at all. If you still have time to edit the comment, I might suggest changing that so it doesn't look like you quoted me.
> But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.
Well sure, blind obedience to anything at all can cause problems.
Are you sure merging code for different datafeeds would be better, though? In such cases, what is identical and what is not should be cross-referenced in comments. But you don't know beforehand which approach would be better, unless you know the datafeeds will stay the same as now.
The sad story here is that if you know the datafeeds will stay pretty static, there's little to gain from making an advanced abstraction over them. Which is why you often find duplicated code that hasn't been touched for years. The original target was met with a naive approach, and the lack of new changes leads to stale codebases.
If you have a 95% match on something nontrivial (and it likely won't diverge significantly), I'd go for merging even with 2 cases. At least merge most of the common parts.
Reading a couple of ifs and some not-quite-duplicate procedures seems much better than having a complete duplicate set in cross-referenced files.
Why are you reading a couple of ifs instead of having the two similar things represented by separate classes with shared functionality in a common class? Or even if you prefer composition to inheritance you could still make it work cleaner without a bunch of if statements.
No, you also need to ask not just whether they're similar, but whether they are the way they are for the same reason, and are thus likely to change together. It doesn't matter if there are 10, if they all might change independently in arbitrary ways.
"DRY" as coined in The Pragmatic Programmer spoke in terms of pieces of knowledge.
If you are combining code just because it looks similar, you're "Huffman coding."
Every time I have seen a feature that was written general to handle possible future uses, after a year of sitting unused there will certainly be modifications or surrounding code that doesn't support the planned extensibility. So it can never be used without a lot of work changing things anyway.
Future coding has led to some of the most overcomplicated systems I've worked with. It's one of the reasons (among many) I quit my last job. I was constantly told code that had no use cases was "important to have" because "we needed it".
So does that same idea apply to all of the many abstractions that geeks do just to stay vendor or cloud agnostic, just in case one day AWS/Azure go out of business?
On the other hand, I worked on a multi-million line codebase that was deeply joined to oracle’s db, with a team who all really wanted to move away from it but couldn’t because in the beginning (a decade earlier) the choice had been made to not put in “unnecessary” abstractions.
It’s not about the abstractions. In the case of Oracle or any other database, if you’re only using standard SQL and not taking advantage of any Oracle specific features, why are you spending six figures a year using it?
The same can be said about your cloud provider. If you’re just using it for a bunch of VMs and not taking advantage of any of the “proprietary features” what’s the purpose? You’re spending more money than just using a colo on resources and you’re not saving any money on reducing staff or moving faster.
You’re always locked into your infrastructure decisions once you are at any scale. In the case of AWS for instance (only because that’s what I’m familiar with), even if you just used it to host VMs, you still have your network infrastructure (subnets, security groups, NACLs), user permissions, your hybrid network setup (site-to-site and client-to-site VPNs), your data, etc.
In either case, it’s going to be a months long project triggering project management, migrations, regression tests, and still you have risks of regressions.
All of the abstractions and “repository patterns” are not going to make your transition effort seamless. Not to mention your company has spent over a decade building competencies in the peculiarities of Oracle that would be different from MySQL.
After a decade, no one used a single stored procedure or trigger that would be Oracle specific? Dependencies on your infrastructure always creep in.
Yes. There's no point abstracting over a vendor API if you're not actually using an alternative implementation (even for testing). Otherwise, keep your code simple, and pull out an abstraction if and when you actually have a use case for doing so.
Vendor agnostic code doesn't anticipate AWS going out of business, just them raising prices significantly. It can be smart to be able to switch in a reasonable amount of time so that you can evaluate the market every few years. This way spending extra time to be vendor agnostic can also pay off. But there's no technical reason for that, it's a cost issue.
It’s often noted that in almost 15 years of existence, AWS has never increased prices on any service.
What are the chances that AWS will increase prices enough to make all of the cost in developer time and complexity in “abstracting your code” and the cost in project management, development, regression tests, risks, etc make it worthwhile to migrate?
The cost of one fully allocated developer+ qa+ project manager + the time taken by your network team + your auditors, etc and you’re already at $1 million.
Do you also make sure that you can migrate from all of the other dozen or so dependencies that any large company has - O365? Exchange? Your HR/Payroll/Time tracking system (Workday)? Windows? Sql Server? SalesForce? Your enterprise project management system? Your travel reimbursement system (Concur), your messaging system? Your IDP (Active Directory/Okta)?
DRYing existing code to handle exactly what is needed to support the current uses with no additional axes of variation isn't adding functionality (well, if it handles more than the existing values in the existing axes of variation, it's adding some but arguably de minimis functionality.)
Building code with support for currently-unused variation to support expected future similar-but-not-identical needs, that is, pre-emptively DRYing future code, is adding functionality.
On one project we ended up with a series of features that fell into groups of threes, and we kept trying to make the 2nd feature in the series generic, and time after time #3 was a rewrite and significant rework of #2. So any extra time spent on #2 was time wasted.
I think that what Burlesona is suggesting is more nuanced (and effective) than the rule of three. We can easily imagine situations where code used twice warrants a refactor, and situations where code used three times does not.
The rule of three suffers the same problem as the pre-emptive refactor - it is totally context insensitive. The spirit of the rule is good, but the arbitrary threshold is not.
Similarly, 99% of your comment is bang-on! My only gripe is the numeral 3. But pithy rules tend to become dogma - particularly with junior engineers - so it's best to explain your philosophy of knowing when to abstract in a more in-depth way.
> My only gripe is the numeral 3. But pithy rules tend to become dogma
Agree, 3 is a pretty arbitrary number. If you have a function that needs to work on an array that you know will certainly be of size 2, it takes minimal effort and will probably be worthwhile to make sure it works on lengths greater than 2.
But the bigger point is valid: you need examples to know what to expect, and if you make an abstraction early without valid examples, you'll almost certainly fail to consider something, and also likely consider a number of things that will never occur.
> make it “generic enough to handle future use cases”.
The answer to this is usually YAGNI.
That is, don’t plan for a future you might never have. Code in a way that won’t back you into a corner, but you don’t know what the future’s cases might be (or if there even will be any), so you can’t possibly design in a generic way to handle them. Often you just end up with over-engineered generic-ness that doesn’t actually handle the future cases when they crop up. Better to wait until they do come up to refactor or redesign.
Some people argue to design in a way that lets you rewrite and replace parts of the system easily instead.
It's not so much that you're preparing for 7 separate cases, as a thing that works in one place and almost, but not quite, fits in 6 others. You rarely exactly duplicate code, but often duplicate and modify.
By the time you hit 7, you do clearly Need It. But now you've got 7 cases to work from in writing the generalization. When the number is 2, it's often reasonable to say, "I don't know how these will evolve and I'll probably guess wrong".
Yes, I agree. That’s not what I was replying to, though. I noted in another comment that I consider merging worthwhile even in the absence of three use cases, certainly if what you have now is very similar.
If all seven are the same except for one or two cases, it doesn’t mean that you have to have a bunch of if statements, you either use inheritance or composition to create special cases and judiciously apply “pull members up” and “push members down” refactoring, interfaces, abstract classes, virtual methods, etc. These are all solved problems.
Yes I know about the whole “a square is not a rectangle problem”.
Oliver Steele describes "Instance First Development", which the language he designed, OpenLaszlo, supported through the "Instance Substitution Principle". I've written about it here before, and here are some links and excerpts.
In the right context, prototypes can enable Instance-First Development, which is a very powerful technique that allows you to quickly and iteratively develop working code, while delaying and avoiding abstraction until it's actually needed, when the abstraction requirements are better understood and informed from experience with working code.
That approach results in fewer unnecessary and more useful abstractions, because they follow the contours and requirements of the actual working code, instead of trying to predict and dictate and over-engineer it before it even works.
Instance-First Development works well for user interface programming, because so many buttons and widgets and control panels are one-off specialized objects, each with their own small snippets of special purpose code, methods, constraints, bindings and event handlers, so it's not necessary to make separate (and myriad) trivial classes for each one.
Oliver Steele describes Instance-First Development as supported by OpenLaszlo here:
[...] The mantle of constraint-based programming (but not Instance First Development) has recently been taken up by the "Reactive Programming" craze (which is great, but would be better with a more homoiconic language that supported Instance First Development and the Instance Substitution Principle, which are different but complementary features with a lot of synergy). The term "Reactive Programming" describes a popular old idea: what spreadsheets had been doing for decades. [...]
Oliver Steele (one of the architects of OpenLaszlo, and a great Lisp programmer) describes how OpenLaszlo supports "instance first development" and "rethinking MVC":
[...] I've used OpenLaszlo a lot, and I will testify that the "instance first" technique that Oliver describes is great fun, works very well, and it's perfect for the kind of exploratory / productizing programming I like to do. (Like tacking against the wind, first exploring by creating instances, then refactoring into reusable building block classes, then exploring further with those...)
OpenLaszlo's declarative syntax, prototype based object system, xml data binding and constraints support that directly and make it easy.
OpenLaszlo's declarative syntax and compiler directly support instance first development (with a prototype based object system) and constraints (built on top of events and delegates -- the compiler parses the constraint expressions and automatically wires up dependences), in a way that is hard to express elegantly in less dynamic, reflective languages. (Of course it was straightforward for Garnet to do with Common Lisp macros!)
>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.
>[...] In defining the semantics of LZX class definitions, I found the following principle useful:
>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.
In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.
So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.
> In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
> That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
This gives me nightmares of the over-engineered XML programming that is infamous in the Java community. You lose all of the benefits of static type checking.
Generic enough code that can be adapted to future cases is code with clean architecture that follows standard OO principles.
On the other hand, trying to handle all the hypothetical cases because "that makes the code generic and future-proof" is usually a complete waste of time.
My view is to develop the simplest, well architected OO code that can handle the use cases at hand.
After witnessing the collateral damage of many software reuse projects (anyone remember "components"?), I came up with a different ruleset, useful for "compromising" with "software architects":
First, translate the data.
Second, divine a common format and share the data.
Third, create the libraries for this common format, to be reused amongst projects.
I have never reached #3 in my professional career. Sure, we wrote the libraries. But other teams and projects never adopted them before the whole effort became moot.
So I kept my projects intact and moving forward, while letting mgmt think they're doing something useful.
> The moral of this story? Don't get trapped by the sunk cost fallacy. If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect. It may have been right to begin with, but that day has passed. Once an abstraction is proved wrong the best strategy is to re-introduce duplication and let it show you what's right. Although it occasionally makes sense to accumulate a few conditionals to gain insight into what's going on, you'll suffer less pain if you abandon the wrong abstraction sooner rather than later.
Sandi Metz's blog (and book) are an absolute gold mine. I'm a junior developer (<5 years), and the material gave me an entirely new outlook on design (object-oriented and otherwise)... and also showed me how un-maintainably most development is done nowadays :(
If there was a required reading list for professional developers, I would put her work on with zero hesitation, I feel it to be that important.
> You call devs Junior until they have over 5 years experience? Wow, that’s harsh
Less than 3-4 years is absolutely junior dev. Up until around 7 or 8 years is mid-level. Passing beyond that would be senior dev and the other titles then follow.
With the obvious note that it's not strictly time bound - someone could easily get stuck at mid level for much longer if they aren't progressing. It's pretty hard to still be only junior after 7+ years
I still very much consider myself junior but someone I know with less experience than me just recently got a job with "senior" in the title. I have seen senior job postings that say something like "3+ years of experience."
I absolutely do not consider myself senior and I don't think I will for at least seven more years.
One of the big reasons I realized I still _was_ a junior developer was because of Sandi Metz's content showing me how long of a journey I still have to go :-P
I don't personally think it's strictly about the time one's been programming, although in my experience that can be a good benchmark for e.g. how good one's abstraction, API design, etc. skills are.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
This ^^^.
I'm in my mid 50's now and have worked as a software dev since my teens. I've learned over time that certain lumps of code need to be left alone for a while before jumping in and aggressively refactoring the perceived "duplication". I've been guilty of this kind of thing before: spot a wodge of code that looks like a duplication, only to find out later you hit some "special cases" as the project progresses and suddenly, as the article points out, your "helper" method balloons into a ball of spaghetti.
As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude. Ask your colleagues first why such code still exists before barging in and making these changes, they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet. It's a common courtesy.
> As an apropos to the article, and touched upon therein, checking in a fairly major change to de-duplicate some code without consulting the original author/team is a wee bit rude.
This sounds like the outcome of bad culture. Ownership of the code should be shared to the point where it should never be considered rude to improve the code. Any part of the code.
> Ask your colleagues first why such code still exists before barging in and making these changes,
If I had to synchronise with others all of my improvements to existing code (which was frequently written in a hurry to meet a deadline, so with shortcuts taken intentionally, or with incomplete knowledge of future use cases) I would get at most half as much done.
> they may already have some concerns as to why refactoring to a helper method or some abstraction isn't in their game plan yet.
If there are alluring "improvements" that don't work for such subtle reasons, this should be documented in the code. If it's not, one has only oneself to blame when someone goes in and changes it.
Edit: I realise now that I'm talking about teams where everyone is reasonably senior. It could be different with junior members on the team, to which many changes might look like improvements when a senior engineer would at most see the change as equivalent. In that case I think you're right, but for a different reason: junior engineers should always check in with senior engineers about things in order to learn more of the craft!
For 90% of shops outside the cornucopia of SV with unlimited budgets and internal customers only, the client drives decisions mercilessly and ruthlessly.
And for a good reason. Products don't exist because they're fun to develop. They exist because they solve a problem customers have. So ultimately, decisions should always be based on what customers (long-term) benefit most from.
One of the areas where I really like "incidental duplication" is in tests.
Tests can sometimes be very repetitive and identical, and it's tempting to want to refactor it in some clever way. That's almost never good.
On top of the reasons laid out in parent comment, tests also function as unofficial documentation. I like having everything explicit in there, it makes them easier to read and understand.
If you try to abstract away tests, you often just end up re-implementing the same abstractions used in the actual code, and you can end up not catching unfounded assumptions that your abstraction is making in both the tests and the code. There is a scope for having test helpers / utils to make tests easier to write, but you should be minimalist with these.
I agree, but some test helpers are reasonable. It's about balance and readability of the tests in question. Otherwise integration tests end up being a hundred lines of code.
I'm not sure I agree. I like to move all setup code to helper methods so that my tests are just a few lines:
// Given <setup>
// When <thing happens>
// Expect <thing> to be <such>
This allows the reader to easily see which workflows are actually tested. If the reader is interested in implementations the utility code is one click away and usually only needs to be looked at once to be completely understood. The test bodies themselves however have many flavors for many work-flows so getting rid of the repetition is critical to highlight the specific nature of individual tests.
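A sketch of that shape in Go's testing package (the cart domain and helper are hypothetical; in practice the test would live in a _test.go file):

    package cart

    import "testing"

    type Item struct {
        SKU   string
        Price int
    }

    type Cart struct{ items []Item }

    func (c *Cart) Add(it Item) { c.items = append(c.items, it) }

    func (c *Cart) Total() (total int) {
        for _, it := range c.items {
            total += it.Price
        }
        return total
    }

    // newTestCart hides the repetitive setup; t.Helper() keeps failure
    // locations pointing at the test body rather than at this function.
    func newTestCart(t *testing.T, n int) *Cart {
        t.Helper()
        c := &Cart{}
        for i := 0; i < n; i++ {
            c.Add(Item{SKU: "x", Price: 100})
        }
        return c
    }

    func TestTotalSumsItems(t *testing.T) {
        c := newTestCart(t, 3) // Given a cart with three items
        got := c.Total()       // When we total it
        if got != 300 {        // Expect the sum of the prices
            t.Fatalf("Total() = %d, want 300", got)
        }
    }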
That’s why I make the test fail before writing the code. If the code is already written, then I break it in the minimal way to test the test, and then fix it.
IME things never fail in the way you expect them to... You can build a fortification where you think you are weak only to find the enemy is already in the castle.
A green test does not equal bug-free code. There may be a misimplementation / misunderstanding of the spec, or your code passes beautifully under the right conditions, with just this data setup.
That's a test for your test, so why only run it once transiently instead of running every time? "Mutant" testing helps with this. It's basically fuzzing your test code to make sure that every line is meaningful.
I can see how that would be useful, but I also think it's a matter of priorities.
I'm basically saying I rarely have bugs in my tests because I verify them first. In fact I can't think of a single bug in my tests over the last 4 years (or even 10 years), but I can think of dozens of bugs in my code.
For example here are some pretty exhaustive tests I've written for shell, which have exposed dozens of bugs in bash and other shells (and my own shell Oil):
I would rather spend time using the tests to improve the code than improving the tests themselves. But I don't doubt that technique could be useful for some projects (likely very old and mature ones)
ITYM "I rarely have bugs in my tests that I'm aware of". The amount of tests I've seen that look like they are working properly, have been "verified" and are buggy is huge. Usually we find they were buggy because someone moves some other tests around or changes functionality that should have broken the tests, but didn't.
Please, don't ever assume that your tests are beyond reproach just because you verified them. Tests are software as well and are as prone to bugs as anything.
And how do you do this in practice? I am struggling to think of a good way to keep the production code that fails the test and the production code that doesn't fail the test together. I might have my test check out an old version of the production code, compile it and test against that. But that is hard to get right.
As a preventative measure, I write some tests for my tests. Also in TDD style of course. And on a very rare occasion, I have to write a test for those tests as well.
I do actually do that. I'll write some buggy code in order to learn how to test for it.
TDD for me is primarily a way to guide myself toward accomplishing a goal. So I sometimes write way more tests for myself than the business needs. I will then delete the scaffolding tests before I tag my PR for review.
Tests have fewer bugs if you write them before the system under test, and if they don't have mocks, and if you have enough of them that anything you get wrong the first time will get noticed by the results of the many similar tests.
> Tests have fewer bugs ... if they don't have mocks
100x this. I've repeatedly failed to convince my team members that mocks are unnecessary in most cases. I've reviewed code with mocks for classes like BigDecimals and built-in arrays. This is especially prevalent in Java teams/codebases.
> Tests rarely have bugs, I find, so generally dry isn’t critical
DRY is important for tests, but the areas where it is have probably (if you aren't writing a testing framework or using a language that doesn't have one covering your use case) already been covered by your testing framework.
I have 2 rules I use when determining whether to duplicate code or to refactor:
1. How many duplications are there? If the code is duplicated once, that's fine. If it's duplicated twice (so 3 instances of it), then it's time to consider refactoring, subject to the next rule.
2. Why is the code duplicated? If it's "incidental duplication", i.e. code that happens to look the same, don't refactor. Only refactor if there's actually a reason why the code is the same. Which is to say, attempt to predict the future: if a change is made to one instance of the code, do I expect it to be replicated to the other instances too?
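A hypothetical Go illustration of rule 2: these two bodies match character-for-character today, but they answer different questions and could diverge independently, so merging them would be premature.

    package dup

    import "strings"

    // Today these two are identical, but they encode different rules:
    // usernames will likely grow their own constraints (reserved words,
    // unicode), while slugs won't. Merging them couples two independent
    // futures; rule 2 says leave them duplicated.
    func ValidUsername(s string) bool {
        return len(s) > 0 && len(s) <= 32 && !strings.ContainsAny(s, " /")
    }

    func ValidSlug(s string) bool {
        return len(s) > 0 && len(s) <= 32 && !strings.ContainsAny(s, " /")
    }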
It's also useful to look at not just the duplication but the code itself. In this case, it was code for geometry, which is not likely to change all that much.
Often the difference between harmless-but-ugly duplication and duplication that is actually harmful lies in the semantics of the code, not just its appearance.
It wasn't code for geometry, it was code for manipulating geometric shapes via UI. The article specifically mentions that custom behavior was eventually needed:
> we later needed many special cases and behaviors for different handles on different shapes. My abstraction would have to become several times more convoluted to afford that, whereas with the original “messy” version such changes stayed easy as cake.
It is, which is why you want to be conservative. If the duplicated code is obviously supposed to be identical, then predicting the future should be trivial. If it's not obvious, then it's a question of "can I conceive of a reason why I'd want to update one and not the others?". And if the answer is unclear, wait a while and see if anything comes up.
This. I wait until the feature is mature, and has been used by actual people. There's no point trying to clean up code that hasn't finished evolving, or that I don't understand fully.
It also means I have a tidy stack of refactoring to do when I'm bored or need a quick motivational win :)
And in the other direction: when adding flags etc. to a method that was created to reduce duplication, consider whether splitting it into two "duplicate" copies, or copying it to the call site requiring the flag, isn't the better alternative.
Or, if possible, split up the helper into smaller units of functionality that can be combined as appropriate for different requirements. That could possibly work for the original article's problem, too.
Refactoring duplicated code is a trade off: on the one hand you create abstraction and centralizing code logic at the cost of, first, the overhead of learning that abstraction, and second, by increasing coupling between functions. I've personally found that the coupling is by far the most important factor to consider. If A depends on B, and C is found in both A and B, then you should factor out C. If A and B share C because they are adjacent, without being fundamentally intertwined, then duplicate C, but consider creating a library or module that makes it easy to talk about things that are similar to that duplicated code, C. (I tend to think about modules/libraries as little, independent DSLs)
Another thing you can do when a function becomes overburdened is to split the remaining similar logic into smaller functions which are composed into specialized functions.
This has benefit in that when analyzing modules, you can spot differences in procedure at a glance instead of needing to dig through 100 lines of somewhat similar imperative code.
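A rough Go sketch of that style (pipeline names hypothetical): the specialized variants stay short, and their differences are visible at a glance.

    package importer

    type Record struct{ /* fields elided */ }

    // Small single-purpose steps (stub bodies for the sketch)...
    func parse(raw []byte) (Record, error) { return Record{}, nil }
    func normalize(r Record) Record        { return r }
    func enrich(r Record) Record           { return r }

    // ...composed into specialized variants, so the difference between
    // the two imports is visible at a glance instead of buried in a
    // hundred lines of near-duplicate imperative code.
    func ImportClass(raw []byte) (Record, error) {
        r, err := parse(raw)
        if err != nil {
            return Record{}, err
        }
        return normalize(r), nil
    }

    func ImportSpecialUse(raw []byte) (Record, error) {
        r, err := parse(raw)
        if err != nil {
            return Record{}, err
        }
        return enrich(normalize(r)), nil
    }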
Right. There are two kinds of such refactoring / DRY-ing up the code:
1) Specialized helper sub-functions/classes to make the codeblocks DRY.
2) Functions/classes to make separate features DRY.
Problems arise when the design or understanding of the code doesn't reflect the realities of changes over time, forcing you to restart/revert, or producing spaghetti code with optionals and whatnot to accommodate the rising complexity.
The agile approach would be to make the code that is easiest to change either way, and prevent being locked in to only one approach.
Go error handling is a good example of this. So far all the attempts to reduce the repetitive `if err != nil { ... }` through some abstraction have failed. Look at https://github.com/golang/go/issues/32825
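For those who haven't written Go, this is the pattern in question; a minimal sketch where every fallible call is followed by the same three-line check:

    package cfg

    import (
        "io"
        "os"
    )

    func load(path string) ([]byte, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err // the check every caller writes...
        }
        defer f.Close()

        data, err := io.ReadAll(f)
        if err != nil {
            return nil, err // ...and writes again
        }
        return data, nil
    }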
In the language, maybe, but both Rust and Zig show that it's possible to have much less 'bloat' for error handling even without using exceptions.
I'd say that the Go designers still have work to do: Zig especially shows that you can be a 'simple' language and yet have both sane error handling and generics.
I thought all of this until I got used to Go's error handling.
There's a couple aspects to this:
1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it. After a while, the regular rhythm of "statement, error check, statement, error check" becomes the routine pattern of your code and it looks weird if you don't check for errors (which is as it should be).
2. The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need it, entirely within the standard language. This is extremely powerful.
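As a hedged illustration of point 2 (names hypothetical): a domain error is just a struct with an Error() string method, and callers can inspect it like any other value.

    package store

    import (
        "errors"
        "fmt"
    )

    // NotFoundError is an ordinary value that happens to satisfy error.
    type NotFoundError struct {
        Key string
    }

    func (e *NotFoundError) Error() string {
        return fmt.Sprintf("key %q not found", e.Key)
    }

    func Get(key string) (string, error) {
        return "", &NotFoundError{Key: key}
    }

    func Describe(key string) string {
        _, err := Get(key)
        var nf *NotFoundError
        if errors.As(err, &nf) { // inspected like any other value
            return "missing key: " + nf.Key
        }
        return "ok"
    }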
The Go team's examination of the language error handling is interesting because it seems there's a conflict between newer Gophers who don't like the verbosity of it (but don't realise the power it brings) and the older Gophers who are used to the verbosity and appreciate the power. Almost exactly like TFA. The repetition looks ugly if you don't appreciate the reasons for it.
"It isn't magic!" mantra is often heard in Go apologetics, but every time I see it, it occurs to me that Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb. Stuff like exceptions or error types isn't magic - they have been around for a long time, they're well understood, and they have significant advantages.
I kinda prefer this to the Rust apologetics, where every post about Go is replied to with three posts about how Rust does it so much better ;)
I've used lots of other languages, as have a lot of other Gophers. I'm not saying "Go's approach is good" out of some strange tribalism or a need to assert my preference. I'm saying this because, having spent over 35 years programming, I really appreciate the simplicity of Go and the lack of magic. I'm not apologising for Go's simplicity, I'm trying to explain why I like it.
Exceptions are pretty much the perfect example of (bad) magic actually; unannotated, dynamically dispatched nonlocal control flow that can execute code (ie, destructors) in completely unrelated contexts on the way past. At least 0x5F3759DF can be boxed into a function and commented in one place.
> Go's definition of "magic" is somewhat akin to a 15th century peasant seeing a lightbulb.
Except exceptions are rarely understood and used correctly by most programmers. They can simplify program structure, but at the expense of proper error handling and error mitigation strategies.
Golang is still in the sort of niche that builds databases, queues, container orchestration, etc., but it can be used to build other things, given enough care and extra effort spent simplifying the solutions.
> Except exceptions are rarely understood and used correctly by most programmers
That's pretty condescending.
The mechanism for exceptions has been around for more than 20 years; it is well understood by most programmers.
The problem is that error handling is hard.
Exceptions are an adequately sophisticated solution to that hard problem. Go's approach only encourages ignoring errors (since the compiler never enforces that you handle them) and boilerplate.
Nope! Exceptions are, in fact, rarely understood and used correctly by most programmers. Including loopz. And me. And the authors of approximately every nontrivially-exception-using piece of code I've had to work with. And presumably also of the code loopz has had to work with.
The difference is that some of us have the good sense to rarely use exceptions at all.
Is it, really? Are programmers omniscient, then, that they can trap all kinds of exceptions correctly from external code? It's a sophisticated method that dumps the problem on the user instead.
Golang also outputs stack traces and even supports panic() if one wants something similar to handling exceptions. The difference is that this is used for classes of errors that are ideally programmer error, and not for all kinds of business-logic states. I'm not saying the Go Way is perfect either, but it's at least a small step acknowledging the difference, rather than defaulting to dumping random programmer errors on unsuspecting users.
Error handling is easier when you improve the design. The problem is that this takes time, thinking, and effort.
Go didn't discover anything. Java's runtime and checked exceptions are already the direct consequence of errors being of two kinds: recoverable and non-recoverable.
Exceptions are basically error types with dynamic typing in an otherwise statically typed language. So if you can deal with dynamic typing in Python, you can handle exceptions in Java.
The main issue with any discussion on exceptions is the elephant in the room, Java. Java has the worst model of exceptions, mixing weird typechecking rules with error handling that doesn't force you to recover from the exception.
I really like the exception model of Erlang, where recovery is only possible from another routine. It is, in my opinion, the best exception model.
Go code is nice because everything is fully explicit, but it's hard to read; trying to follow the control flow when half of the statements are error-recovery code is just very unpleasant. And I still don't know how to express that an error is the result of another error without forgetting the context.
This is kinda my point: it's hard to read until you get used to it. I can totally see why someone used to Python has a hard time with Go's error handling, because they're not used to the rhythm of it (logical statement, error check, logical statement, error check, logical statement, error check); they're more used to (exception handling setup, logical statement, logical statement, logical statement, exception handling completion). It's different, and therefore strange and weird. But after a while you get used to it, and expect to see an error check after each logical statement, and it sort of merges into one structure in your mind.
There are packages for wrapping errors, and I believe some form of error wrapping (using the fmt package for some reason) is being adopted. More than that is up to the coder to implement.
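For reference, the adopted form is the %w verb in fmt.Errorf (Go 1.13), together with errors.Is/errors.As, which also answers the question above about expressing that one error resulted from another:

    package main

    import (
        "errors"
        "fmt"
        "os"
    )

    func loadConfig(path string) error {
        if _, err := os.ReadFile(path); err != nil {
            // %w wraps err, preserving it for errors.Is / errors.As.
            return fmt.Errorf("loading config %s: %w", path, err)
        }
        return nil
    }

    func main() {
        err := loadConfig("/no/such/file")
        fmt.Println(err)                            // contextual message
        fmt.Println(errors.Is(err, os.ErrNotExist)) // true: the cause survives
    }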
> Java has the worst model of exceptions, mixing weird typechecking rules with error handling that doesn't force you to recover from the exception.
Unless you have a specific complaint about Java's model (which I'd love to read), I strongly suspect that your beef is more with a few standard Java library functions misusing checked exceptions than with exceptions in general.
The combination of runtime and checked exceptions offers the most complete solution to the difficult problem of handling errors, with the compiler guaranteeing that error paths are always handled.
The big problem with Java's model is that exceptions aren't part of the type system: they're that whole separate thing that is applied to methods, but it's not a part of that method's type in any meaningful sense. This means that it's impossible to write a higher-order function along the lines of "map" or "filter" that would properly propagate exceptions, because there's no way to capture the throws-clause of the predicate and surface it on the HOF itself.
Sounds nice, and of course it is possible to build solutions with exceptions that do recover all errors elegantly and cleanly. However, the correct judge on this would be your own users. Given enough care, the discussion becomes rather philosophical.
Though having to confront errors through the call stack makes one review where handling would be most prudent, in real life the time pressures are just too strong, making such efforts largely unrewarded.
> The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system
The error maybe, but not the result of the call. The multiple-value return x, err is not a first-class value. It cannot be handled like any other variable.
This was demonstrated very clearly with proposal for try. try would have automatically returned with err when err != nil. But what if you wanted to change the error, say create an error message? Then try was completely useless. In Rust, where the result actually is just a regular value, you can transform the error however you like just like any other value and try is just as useful as before.
Sorry, I don't understand what you're saying. Are you saying that because there's a proposal in v2 for error values to not be 1st class values, therefore they're not in v1?
I think it’s that in Go multiple return values aren’t a first class value. It’s just two separate values. Whereas in Rust or Haskell they’d be a single, first-class Result<a> (or whatever) value.
> 1. After a while, the "if err != nil {" becomes a single statement in your mind, and you only notice it if it's different (like trapping things that should error with "if err == nil {"). In other words, it only feels verbose if you're not used to it.
The whole point of programming is to abstract away repetitive work. Yes, a human will spot that the pattern is the same, but this is both fallible and a waste of human effort. And even if you can see the difference, those extra characters are still filling up your lines and making it hard to keep functions on a single screen where you can comprehend them more easily.
> 2. The point of Go's error handling is that it isn't magic. There's nothing special about error values, and they are handled exactly the same way as every other variable in the system. The only thing the language defines about errors is that they have a method called Error that returns a string. That's it. This means that you can create complex error handlers if you need it, entirely within the standard language. This is extremely powerful.
There's nothing magic about something like https://fsharpforfunandprofit.com/rop/ either. Just plain old functions and values written in the language (that talk literally gives the definitions of all the types and functions it uses, written in plain old F# code). You need functions as values, but everyone agrees that's a good idea anyway, and you need proper sum types, but you need those anyway if you're ever going to be able to model a domain properly.
> The point of Go's error handling is that it isn't magic.
The problem is, first, that sum types are also not magic. There is nothing special about the error type or value in an `Either<T, Err>`. Go's type system is just too crappy to make such things, or to make good use of them even after you've shoved them into an interface{}.
The second problem is that Go's error values, like every error handling system that pretends it doesn't need sum types, have picked up more magic (%w) or impacted the usability of other interfaces (context.Err, separate error channels) bit by bit.
The downside I see with Go's error handling is that you can forget to check. With Rust, if the function being called returns Result, you have to deal with the error (even if dealing with it just means propagating it out). Missing error handling is such a common source of bugs that Go really turns me off here.
Specifically for this issue, linters also have many false positives. Some Go libraries trying to encourage a fluent style will accept an error for some logic and also return it, so you can `return x.HandleError(err)` - but if you don't want to return it, you obviously don't care that it returns what you just passed it. (I personally consider fluent methods a bad idiom in Go, but I also don't get to write all the Go code in the world or even in my project.)
There are also a lot of functions that return errors because Go's type system demands that if the interface returns two values `T, error`, every implementation must also - it won't auto-create a nil value for the second result. That's reasonable if you are committed to errors just being normal values. But such a restriction would not be necessary if the interface could be declared with a sum type - promotion of a `T` to an `Either<T, ...>` or `Just T` or so on would be fine for all types, not just error handling. Lots of infallible Writer implementations like bytes.Buffer and hash.Hasher suffer from this, and linters can't be aware of all such cases.
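For example, bytes.Buffer: the io.Writer contract forces it to return an error that its docs say is always nil, which no general-purpose linter can know:

    package main

    import (
        "bytes"
        "fmt"
    )

    func main() {
        var buf bytes.Buffer
        // Write must return (int, error) to satisfy io.Writer, but
        // bytes.Buffer documents the error as always nil; a linter
        // flagging an ignored error here is a false positive.
        n, err := buf.Write([]byte("hello"))
        fmt.Println(n, err) // 5 <nil>
    }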
Sure, but given a choice, I'd rather work with primitives that are correct-by-construction, not correct-if-I-use-an-extra-tool-and-actually-act-on-its-advice.
If you do use a linter and have it set up so linter issues are fatal to the build, then you run into the issue that if the linter throws false positives, you have to add exclusion rules (if the linter even supports that) or downgrade linter issues back to warnings, and lose the benefit entirely.
One aspect of go error handling that still really bothers me is how easy it is to accidentally drop an error on the floor.
If you strap all the linters you can find onto your build system, you can catch the most obvious cases. But I still frequently find mishandled errors that sneak past the linters in ways that can't be solved without restricting the way devs use the language.
By making error handling something you have to deal with every time you call a function, you massively increase the number of opportunities you have for screwing it up.
I would love something like Rust's ? operator for Go. You could choose to not use it when you need special handling. But it would be rare and exciting, and developers would use it with care.
I don't really think Go error handling is a good example. A nil test paired with a return cannot be extracted to a function. A macro could perhaps centralize the logic, but otherwise it's no more amenable to deduplication than plain addition. Moreover, it's completely trivial, if verbose in the Go language.
The case where it becomes interesting is when there are four or five statements that are repeated. If it's two statements, especially if the statements are both control flow, one of which is tied to the frame of the executing function, and the other is a trivial nil test, that falls firmly on the "not refactorable" side of the line.
> Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about
In which case, you should split the helper function (extract the sub-part common to all cases, and handle the differences where the helper function is called).
I think I would most of the time go with de-duplicating as early as possible, as long as the helper function only has few parameters and can be described in plain english rather easily.
To me the cost of having to refactor the helper function later in the process is less than dealing with duplicated code.
Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read. Every person who reads the code has to make the de-duplication effort mentally (check that the duplicated parts are indeed duplicated, figure out what they do, and on what parameters they differ...).
Premature optimization. Duplicated code is only evil when there's a bug you only fix in one place and forget about the duplicates; in almost every other case, it's easier to reason about and is more resilient in the face of local changes.
Abstraction is often like compression, and compressed data is easier to corrupt. Change the implementation of an abstraction, and you put all consumers of it at risk. It's not an absolute good.
Consider a block of 10 lines of code that appears 4 times, identical except for a couple of parameters. The person reading the code has to 1. figure out what the code does, and 2. check whether the duplicated parts differ in some subtle way.
The alternative is to replace the duplicated parts with a function that has a meaningful name. This makes the code easier to read. It's not a premature optimization. It would make sense to keep it duplicated if you're in an explorative phase and not sure yet what the final design will be, but I wouldn't submit a patch where parts are obviously similar. I'm not sure it would pass review.
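A sketch of that alternative in Go (the names are invented for illustration): four pasted copies of the same normalization block, differing only in the separator, collapse into one function whose name carries the intent:

    package main

    import (
        "fmt"
        "strings"
    )

    // normalizeHeader replaces the block that used to be pasted at four
    // call sites, identical except for the separator.
    func normalizeHeader(raw, sep string) string {
        fields := strings.Split(raw, sep)
        for i, f := range fields {
            fields[i] = strings.ToLower(strings.TrimSpace(f))
        }
        return strings.Join(fields, sep)
    }

    func main() {
        // Each call site now states only what actually differs.
        fmt.Println(normalizeHeader("Name; Age; City", ";"))
        fmt.Println(normalizeHeader("id | total | date", "|"))
    }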
It really depends. For test setup, I'm inclined to leave duplication alone; it's often easier to reason about when a test breaks. For code with control-flow changes in the middle, abstraction is honestly dubious. For mere value differences, maybe, but if the values are complex and not merely scalar, maybe not. The more edge cases an abstraction needs to cover all the instances, the more duplication it takes to justify it, especially for control-flow differences rather than value differences.
I generally find it pretty easy to reason about code structured like:
    switch (object) {
    case type1:
        // (bunch of code)
    case type2:
        // (bunch of code)
    case type3:
        // (bunch of code)
    // etc...
    }
Even if the function is long it's pretty easy to skip over the irrelevant parts.
Where you get in trouble is when you discover a bug (or a changed requirement) in something that's duplicated several times and have to remember to hit all of them. The last one, especially, seems to be missed most often.
Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
> Even if the function is long it's pretty easy to skip over the irrelevant parts.
> Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.
Which part is irrelevant? As a programmer, I don't generally know which value `object` has, so if I need to understand the whole statement, I need to look at every case, and I often need to check whether the cases are identical or only slightly different.
Duplicate code like this is a well-known source of bugs, one of the cases most often highlighted by static analysis tools.
It kind of depends what you're doing, but a lot of the time you are in a situation like "there's a bug when you do X", and when you find code like this you scroll through the options until you get to the one labeled as function X.
Inside of there you may run into stuff that was set up earlier in the function, but it's pretty easy to backtrack to the top of the switch statement to see what was done before then.
There's nothing quite like the joy of being asked to fix something and discovering that it's a big old top down project without excessive abstraction/compartmentalization. Just skim through the code in order till you get to the part that's misbehaving and start investigating. No need to jump around a dozen factory template engines or custom generic template wrappers to find the 3 lines of business logic that have an off-by-one error or something.
I am such a huge fan of Big Ol' Switch Statements over using polymorphism/types/generics/whatever. So easy to understand, and it's all there in a huge scrolling list of cases. If something needs to change, it's easy to change it, and you know what else is affected.
That works until you have 20 different Big Ol' Switch Statements, each switching on (mostly) the same cases, essentially implementing a set of related behaviors in 20 different places instead of grouping the 20 behaviors under the same umbrella.
Overall, I think there is an equilibrium between the number of cases in the switch and the number of different switches with mostly the same cases. The fewer cases you have and the more times you handle the same cases, the more it will help to group these different behaviors in separate places: each case would correspond to a different class, each switch statement to a different method on each class. The fewer classes and the larger they are, the more I think it helps to apply this transformation.
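A sketch of that transformation (the shape types are invented for illustration): each case in the repeated switches becomes a type, and each switch becomes a method:

    package shapes

    import "math"

    // Before: Area and Perimeter each lived in their own switch over a
    // kind tag, repeating the same case list. After: cases become types,
    // switches become methods, and all behavior for one case lives together.
    type Shape interface {
        Area() float64
        Perimeter() float64
    }

    type Circle struct{ R float64 }

    func (c Circle) Area() float64      { return math.Pi * c.R * c.R }
    func (c Circle) Perimeter() float64 { return 2 * math.Pi * c.R }

    type Square struct{ Side float64 }

    func (s Square) Area() float64      { return s.Side * s.Side }
    func (s Square) Perimeter() float64 { return 4 * s.Side }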
By the way, this has nothing to do with the discussion above. The alternative to the switch that the GP commenter presented isn't polymorphism; it is simply extracting the common lines into a separate function.
> Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read
That is, I believe, contestable. Yes, it can end up easier to read, but there is a big tradeoff: when you remove code from its context, it is much harder for a reader to reason about. Once something is extracted into a function you can't see, you have to mentally ask questions like "could this return null/None?", "how will it behave if the input is negative?", "does it throw exceptions?", "is it thread safe?", etc. All of this is directly observable in context when the code is inline, not so when it is removed to a central place.
Ideally, these questions should be answered by the function name, type, and documentation. And if not, one can always jump to the function definition, where those elements (missing if the piece of code is inlined) give additional hints about what the code does.
Upvoted. We'd probably come up with different implementations (I strongly prefer composition and "interpreter"-style code), but whenever I've seen casual mutations in different dataflows, it was because of one-off changes made while ignoring the whole.
I think about it in terms of bugs. If your abstraction causes a bug, I have to go in and work out wtf your wonky abstraction is doing and also risk breaking other cases. If there are 6 duplications and there is a bug because one of them is missing a change applied to all of them, that takes 5 mins to fix and risks breaking nothing.
When you make an abstraction, think not only "Will this create bugs?" but also "If this abstraction does create bugs, will they be easy to identify and fix?".
Extracting a common function was the right move in your example.
The bad move was to add options to the common function instead of changing one caller to call a new, different method.
An even worse move was adding more and more options to the originally simple method. Just because something was rightfully extracted to a common place 10 months ago does not mean it has to remain in a common place. And the need for a split does not mean the original extraction was wrong.
As with everything, it often becomes a big grey area on what is acceptable and what is not.
Example: my (fictional) company sells a B2B platform which provides companies with an online marketplace or some other type of online application. Each installation requires different, though often similar, integrations with the customer back-ends - think Postgres vs MySQL, but some customers may also use TotallyEffingObscureDB, which doesn't even speak SQL. Those backends are usually the source of truth for the entire stack, storing user data, etc.; the local data is just a mirror.
So, given this scenario, how should we approach user registration processes? Is that a product component or custom (not DRY) code? What about all the other (similar, common, and basic) features of our platform that every customer wants but invariably needs to be implemented slightly differently?
I’m only posing these questions because I worked in a situation where something similar happened, and it wasn’t dealt with well, and I’m wondering what other HN coders have to say...
Ironically, we're handling this with the concept called "clean code".
We do have a core which implements the base logic. Everything in there is domain-driven, but only using DTOs and providing interfaces for input and output, with repositories and presenters.
When the data source changes, we only need to add the new repositories and set those within the context.
If we have to implement some specialized logic just for one client, we add this as a plug-in on top, so it is handled outside of the core logic.
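A rough sketch of that shape in Go (all names hypothetical): the core sees only DTOs and interfaces, and each installation plugs in its own repository for whatever backend the customer runs:

    package core

    // User is a DTO: plain data, no behavior.
    type User struct {
        ID    string
        Email string
    }

    // UserRepository is the port each installation implements, whether
    // the backend is Postgres, MySQL, or TotallyEffingObscureDB.
    type UserRepository interface {
        FindByEmail(email string) (*User, error)
        Save(u *User) error
    }

    // RegisterUser is core logic; it never learns where the data lives.
    func RegisterUser(repo UserRepository, email string) (*User, error) {
        existing, err := repo.FindByEmail(email)
        if err != nil {
            return nil, err
        }
        if existing != nil {
            return existing, nil
        }
        u := &User{ID: newID(), Email: email}
        return u, repo.Save(u)
    }

    func newID() string { return "u-123" } // placeholder ID generator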
Of course all of this is very abstract, and since I've worked mostly with simple MVC concepts until now, I'm still struggling to get my head around this approach, but so far it's working out very well.
It may look overcomplicated at first, but there is a very strict yet very flexible logic behind it, which has handled all the "creative" ideas our clients have had so far, while still remaining maintainable.
It is a long process to get to this method of programming, but my initial scepticism has completely changed to "why haven't I worked like this before?".
That’s pretty much the approach I would suggest and think works best. Thanks for your reply.
Looking back, I think the place I worked had a lot of issues with how code was actually stored and maintained - different repositories for each client and each feature, for example, meant a lot of common code was simply copy/pasted when reused, and that obviously made sharing bug fixes much more difficult or even impossible. Real dirty stuff. The solution isn't all that complicated, but when you're on a services team with several hundred clients and hundreds of issues in your backlog, management focuses less on paving the way for the future and more on getting things done immediately. A few of us did implement a single code-base / configurable framework for one big feature, but it was hard to get buy-in and convince people to use it - even if it reduced the workload from days or weeks to hours. The concepts from that were eventually re-packaged by management and sold by a more creative manager as an "SDK", but I didn't have the privilege of working on that team.
Interesting. I'm curious how long your core system with the base logic took to reach that maturity, and how many people were involved. What kind of development model did you use?
My first job at a bigco had a very heavy read-the-code culture, and being able to read code and write readable code is one of the most valuable skills I learned there. The lack of this skill is one of the things I've found most frustrating about working with less talented or, now that I'm a bit later in my career, more junior engineers. There's a tendency to glom onto black-and-white, superficial rules without understanding them, instead of appreciating the principles underlying them and having a sense of how to balance trade-offs. This creates an unfortunate cycle: everyone writes unreadable code, so nobody practices reading code, so nobody internalizes what readable code looks like, and everyone continues to write bad code.
I tend to have a strong reaction to duplicated code, but DRY is risky if whatever you're pulling out isn't logically concise and coherent[1]. Some of the helper functions I've seen in code reviews (as well as the one in the OP) strike me as written by unsophisticated language AIs, catching textual similarities that aren't semantically linked and pulling them out.
The engineers I've mentored over the years, including ones starting with no eng experience, go on to write fantastic code (and no, this isn't just my opinion). But it's a very labor-intensive, hands-on process of thorough reviews and feedback until they internalize the underlying principles, and it can be tough to swing in very short-term-oriented company environments. Now that I'm running larger teams, I've been noodling over how to encapsulate this deeper understanding of what makes good code in a way that scales better. But the fact that Google et al haven't already done this makes me think it's not something you can teach with, e.g., a bullet-point list.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
(Note: this isn't a case of what I describe in the earlier part of my comment, as I don't think this is a superficial, black-and-white rule)
I disagree pretty strongly here, especially since your #2 involves waiting till you hit a bug. IME, there are many cases in which a solid understanding of the code lets you pull repeated code out in a way that's principled enough to adapt well to future changes. Indeed, this is true of most changelists to some degree, or you'd end up with no functions or classes other than main().
[1] A good rule of thumb is whether the free function has a reasonably readable name that captures its logic, without having to abuse handleFoo() or processBar().
It's important for developers to understand that programming is at least as much about expressing yourself clearly with language, as it is about maths and compsci views of functions. Language that's more verbose, but also adds clarity, is a good thing.
If I had an instruction book on building a cabinet, it wouldn't help to re-list every screw, tool, and their sizes on every single step if I could put a parts list at the front. But it also wouldn't help to collapse every matching group of steps with one or two different parameters together.
Can't be specific without knowing the exact helper function, but I find sitting down with a cuppa (away from the computer) and planning an interface (if it's C++) that accepts, say, a policy class with a default and a callable object ("functor") very helpful.
Sometimes it can't be done, but it's better to have a bug-free building block (macros don't count!) that can accept buggy user-defined tasks than lots of repetition.
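In Go terms (rather than a C++ policy class), the analogous building block might be a small generic function that takes the user-defined task as a function value; a sketch:

    package main

    import "fmt"

    // Map is the tested-once building block: the iteration logic lives in
    // one place, while callers supply their own (possibly buggy) task.
    func Map[T, U any](items []T, task func(T) U) []U {
        out := make([]U, 0, len(items))
        for _, it := range items {
            out = append(out, task(it))
        }
        return out
    }

    func main() {
        lengths := Map([]string{"one", "two", "three"}, func(s string) int {
            return len(s)
        })
        fmt.Println(lengths) // [3 3 5]
    }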
The best way I’ve had this explained is that refactoring != DRY. Some methods have a similar structure, but if the problem they’re solving isn’t fundamentally similar then it’s not truly refactoring and you’re setting yourself up for more work in the future.
> don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first.
I have a similar rule of thumb: when you do a thing the first time, just get it working. When you do a similar thing a second time, just get it working. When you do a thing a third time, pull it together as a cohesive abstraction.
This is a silly generalization, but the point is that you generally need to see a handful of examples of a thing before you can make any useful abstraction. What's more common is people creating abstractions during the first iteration, leading to features that are never used, and others that were never considered.
And the abstractions are not abstracting over the correct things so they are actually a net negative. One has to reason about the abstraction AND the problem domain.
I find it helpful to work with ADTs of values, lists and maps. No OO, just functions for selection and projection. The majority of programming is figuring out the nuances of the domain and getting something working. Code is actually an impediment to that.
I think that people often make these choices based on latent social and work-related contextual factors as much as there is any misunderstanding or blind adherence to the wrong set of guidelines. There are human factors as well that go into people defensively citing guidelines for their decisions after a criticism comes up in code review. It's somewhat likely that the choice of a premature deduplication wasn't deeply considered. The programmer just did it because they felt like it made sense, and when questioned about it, they claim whatever "rule" exists to justify it.
There's a strong pressure in work environments to get shit done, get it through review with as little fuss as possible, and move on to the next thing. That's work after all. Choosing to leave something in a state that's guaranteed to need attention later can be viewed as lazy. If you at least make an attempt to do something in a clean way, it will be perceived as "work" even if it turns out to be the non-optimal choice.
I find for myself, at least, that I'm far more willing to let things evolve in my personal projects and to discover the right abstractions over time than I am in my work projects. I don't know if that's true for everyone though. My personal projects are always about the learning process. Maybe for a lot of people that's not the case? The focus is more on the product than the process?
I write the first version of every side project in a language I don't know, or don't know well, and rewrite the version I'm actually going to use in a language I do know well. So I'm really happy to let things evolve. And the more I know the domain in one of my main languages, the more time I spend in the new language, because I want to distance myself from what I think I know, examine it fresh when I come back, and see whether what I think I know is actually still (or was ever) valid.
Different people have different teams and motivations both at work and at home, so while I agree with you that taking a wait-and-see approach is often a really good idea, there are often human factors involved in these decisions where the same person will make very different choices depending on their environments and motivations.
> I encourage my devs to follow a rule of thumb: don’t extract repetitive code right away, try and build the feature you’re working on with the duplication in place first. Let the code go through a few evolutions and waves of change.
... alright buddy, I’m going to need you to remove your spyware from my computer!
I just did this last week with a CBOR message interpreter in C. I did it the long messy way, saw a bunch of repetition, “cleaned” it, changed the way the parser is called, changed some of the validator that runs after it... then did that entire process two more times as requirements changed.
Loved the "incidental duplication" term. I will def start using it from now on.
But to add to this, I want to say that the moment you find yourself writing a "helper" function is the moment you should realize you didn't fully understand the problem.
As a rule of thumb, and something I always try to pass on to my peers at work, think in terms of your domain and the models in it:
- What models is this helper function acting upon, or on whose behalf?
- Does it make more sense to create a model and put that function in it?
- Should that function live in one specific model, or in more than one?
I am working on a large project and I also realized this. I worked really hard to not repeat myself. I tried to reuse as much as possible so as not to create extra code. Then I quickly realized the things were NOT EXACTLY the same: the methods to handle them were similar, but they were not the same. So I had to refactor a lot of other helper methods.
I agree with the approach you are describing. It would be easier to refactor later, as one would probably have a better understanding of the application in the wild.
I wonder if we could have IDE features that would make #2 less likely, e.g. marking some lines of code as similar to lines elsewhere, so that if you change them in one place, the IDE reminds you about the others.
JetBrains has something like this in their tools. It pointed me in the right direction a couple of times when touching code that is similar to something elsewhere.
Relying on your IDE to maintain program semantics is terrible. An IDE should help you write good code that you commit, not be a crutch for writing bad code.
People in the thread are arguing that it's not necessarily bad code though. Also, the point of the IDE would be to help you just not forget to change it.
I am coming to the conclusion that code reuse is oversold as an ideal in universities. Often the desire to create reusable code means we create overly complicated code with less duplication.
Another approach, with a statically compiled language and a good IDE, might be to reduce duplication whenever you see it and inline it back when you feel the abstraction is no longer useful.
From my experience, I would start with a more compact implementation first in hopes that it would save me time. This usually ends with me having to do more complicated operations in my head to plan the code. It's tempting. It's like trying to do math in your head to save time - often leads to a mistake. Now I believe there's no shame in trying out a solution first and rewriting it later when I get a feel what the problem is really about.
John Carmack once said something like "If you're not sure which version is better, code it both ways and compare".
The only downside of later optimization is that in a commercial environment they often don't let you clean up.
> The only downside of later optimization is that in a commercial environment they often don't let you clean up.
Exactly (speaking for custom industrial automation projects). The follow-up requirements for a project might come in 5 or 7 years, or never. You would be cleaning up an 80% dead project. In these environments, it's not uncommon to no longer have a dev environment, because the system is a singular testbench that costs upwards of 2 million and only exists in 2 factories, so all bugfixes must be extremely conservative and any refactoring is strictly forbidden because there's no way to locally test for regressions.
Reusing a function like this is not clean code. It violates several principles.
1. Functions should have as few parameters as possible, and should almost never have flag parameters. This is a basic thing and costs very little to follow. As soon as you want to add a flag to a function, you need to make a new function instead (see the sketch below).
2. Minimize coupling.
3. Single responsibility principle: a unit of code should have one reason to change.
Of course in order to follow principles 2 and 3 here you may well need to consider the business logic.
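Point 1, sketched in Go (the names are hypothetical): the moment a boolean flag appears, split the function instead:

    package report

    import "strings"

    // Tempting but wrong per point 1: render(items []string, compact bool).
    // Splitting gives each function one job and callers a descriptive name.

    func renderCompact(items []string) string {
        return strings.Join(items, ",")
    }

    func renderPretty(items []string) string {
        return "- " + strings.Join(items, "\n- ")
    }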
One big way I prevent this from happening is to treat classes as interfaces to data structures and keep everything that isn't about accessing the data elsewhere. Conversions to other data types go somewhere else. In fact I don't want my data types depending on any other data types at all.
When doing this any of this repetition or evolution can stay out of the data structures themselves so that they can be reused without irrelevant baggage.
Have you not simply abandoned OOP at that point? A core point of OOP is that objects manage their own state, and provide an interface for accessing/mutating it.
If classes are only used as data structures, and everything is done through (presumably pure) utility methods, it sounds like you're writing procedural code in an OOP language.
That's not inherently a bad thing, but OOP provides benefits and you may be making a trade-off without thinking about it.
I try to worry much more about what works rather than labels.
If you stuff all your functionality and data transformations into class definitions, your class definitions will be full of dependencies. Now all the fundamental elements of your program depend on each other and can't be separated.
It's like trying to pile up concrete until it forms a cave instead of laying it down to make the floor and building on top of it.
And a rule for Java programmers as well: nobody is going to "extend" your program. If your interface has only one implementation, you do not need the interface at that time, and possibly ever. Nor do you need DI or whatever other masturbation "best practices" Java gurus prescribe. When it comes down to it, Java is a simple, pleasant language. So people invent all sorts of indirection to appear smart. Don't. Just do it simply and readably, with as little indirection and abstraction as possible.
> Firstly, I didn’t talk to the person who wrote it. I rewrote the code and checked it in without their input. Even if it was an improvement (which I don’t believe anymore), this is a terrible way to go about it. A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code.
While this is _technically correct_, this isn't how humans work. Humans attach their worth to things they do, even when they shouldn't. It's a difficult thing for most people to avoid, so if you write some code, commit it, and then later see a teammate completely rewriting it, it can come off as either:
1. You spent a lot of time and hard work on code that you kind of identify with (I mean, it's code you wrote; it's your "art"), and someone coming in and re-writing it without even asking you can make it feel like they think you're an idiot.
or maybe even worse
2. They didn't know the reason you wrote it that way (maybe it was in support of future changes?) and now they just screwed it all up.
Maybe you haven't run into either of those situations. If you haven't, great! I spent the first part of my career hitting the first one because of how hard it can be to dissociate yourself from the work you produce, and the second one I see happening occasionally. It's a breakdown in communication within a team.
That’s the behavior of a junior engineer though. It’s a sign of an inferiority complex where any change feels like some kind of judgement on your ability to write it “correctly” the first time.
You are not your code, the code you wrote is not representative of you. Nobody who rewrites your code thinks that either.
You’re on a team working to build something larger than yourself. You’re not a bunch of painters sitting with your own easels in a room hoping to tape all of your paintings together in the end.
The best engineering teams I’ve ever worked on were groups of people that had no issues at all modifying each other’s code because we all trusted each other to do the right thing. The worst were the ones where each module had one “owner” who handled all of the changes to their module and “nobody else’s”.
Same here. And my own code is rewritten frequently as business needs change or better factorings are discovered when we hit the rule of 3, as mentioned in this thread.
It's nothing personal; we all get paid and go home. There's a lot of bruised egos in this thread that would be better spent in therapy, I think.
The fix for this is to have a review system in place; you can then catch blatant code repetition before it goes in, and try to use your soft skills to guide the author toward a different solution such that they think it was their idea all along.
If you're the one who created it and who touches that part of the code more often than others, then there is a notion of ownership: you don't want to accept changes that will drive it into a dead end and slow you down. That's how code review emerges. It's the same as the pull request system in open-source development.
I have a very different view of this. As a disclaimer: I work in a smaller team where most code is non-trivial; it may be different in very large teams on large code bases.
This isn't a question of ownership in the sense of private property. It is about how you interact, about respect and professionalism. First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written. So this is partly a matter of social interaction. If the code is badly written, it needs to be fixed, but one should be reasonably polite about it. Usually, the better way of doing this is having a quick chat with that person where you explain why and how you think the code should be changed, before you just do it. Also to pass on any insight you have about the code that the creator missed.
On the other side, there is the big elephant in the room: unless we are talking about rather simple code, the code perhaps was written the way it was for a reason. Often enough this isn't apparent at first glance. So a refactoring without checking with the author always bears the risk that you introduce a bug, or at least get in the way of planned changes that would be incompatible with the refactoring.
So there are many reasons to quickly check with the author before making significant changes. If the proposed changes are good, the author will be happy for you to make them; if not, it's better to have talked about them beforehand. And in any case, it is good to have communication about major changes.
> First of all, a larger change to code a teammate wrote has a ring of saying: it was badly written.
Not necessarily. In actively maintained projects code changes all the time in some areas, and if you are doing proper commits, very often a piece of code does not magically appear in a perfect state but is rather improved in a series of changes.
If you can improve the code you wrote, others can do too.
Code quality is not a binary thing. Moreover, if code quality actually has been improved, you should be happy with it.
Now how it is done is also important, but neither more nor less than any other human interaction.
But that is exactly my point: there are not only technical but also social considerations. And yes, in the end a code quality improvement - if the refactoring actually is one - should make one happy. But it is still the decent thing to check with the author first, if that is possible.
Several years ago, I wrote a big chunk of code to analyze engineering data from an engine test cell, and display a graph of the results. Someone else had done the hard part; I was merely coding up a gloriously-complex Excel spreadsheet in C++. I grabbed data from a MySQL database, and labeled the row data like: row[combustion_air_mass_flow] + row[fuel_mass_flow] * row[specific_gravity_of_diesel]. (Or whatever; it's been awhile. You get the idea.)
I had HUNDREDS of lines of calculations, and each variable was very clearly understood so that you could trace the whole process. The code was running and producing replicated results from the spreadsheet. We started trusting it to process new test runs, instead of copy-pasting into Excel.
The next morning, I came in to find the "other" programmer had stripped EVERY variable reference, and replaced them with the column numbers. There was absolutely NOTHING in the code that could help you understand that "combustion_air_mass_flow" was column, say, 54.
I turned to him and asked him what happened, and he said it was inconsistent with the rest of the code base. I was literally gobsmacked. I racked my brain for a response. In the awkward pauses, he admitted that my code was better, but he couldn't bring himself to use it, because that would mean that he would have to go back and recode every other place that worked like that, and there were many.
He was the guy responsible for most of the system; I was just writing this part because I'm an engineer who codes, and could understand the actual science going on. In the end, I re-replaced my equations with my previous code, and wrote another couple hundred lines defining column number to engineering variable, to "translate" his column numbers to something that made sense.
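The difference, roughly (the column numbers and values here are invented; the real schema had hundreds of columns):

    package main

    import "fmt"

    // Column indices named for the engineering quantities they hold.
    const (
        colCombustionAirMassFlow = 54
        colFuelMassFlow          = 55
        colSpecificGravityDiesel = 56
    )

    func main() {
        row := make([]float64, 60) // one record from the test cell
        row[colCombustionAirMassFlow] = 120.0
        row[colFuelMassFlow] = 4.2
        row[colSpecificGravityDiesel] = 0.85

        // Legible: the formula reads like the spreadsheet it came from.
        total := row[colCombustionAirMassFlow] + row[colFuelMassFlow]*row[colSpecificGravityDiesel]

        // Stripped: the same computation as the rewrite left it.
        same := row[54] + row[55]*row[56]

        fmt.Println(total, same)
    }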
So, no, I don't believe in "do it; don't ask."
And, if, by some chance, you see this, Chris, I still think that was the weirdest flex I've ever seen.
This reminds me of when a developer took over my codebase while I was on holiday. When I returned, I discovered that he had converted all tab indents to spaces across the entire project. He completely destroyed my ability to perform diffs against earlier commits, because his preference was evidently more important. Of course this was all justified with a link to Google’s coding style guide.
The commonality in both your and the parent's situation is that you reacted defensively. Your response was "You destroyed my tabs!" when a proper response would have been to try to understand your coworker's motivation. Perhaps he had some really good arguments for preferring spaces over tabs? Tabs can be problematic in heterogeneous environments where developers with different tab settings and different editors work on the same codebase. Have you read the arguments in Google's coding style guideline?
Maybe your coworker was a fucknut that did it just to mess with your code. I wasn't there so I can't know. My point is that you shouldn't assume that.
Frankly, the justification doesn't matter here: A change of that scope should be discussed and agreed on with teammates before you even put up a patch, not committed while the primary developer is out of town.
If this happened to be Python, there's no such thing as a safe whitespace-ignoring diff: two pieces of Python code that have different semantics can look identical under a whitespace-excluding diff, so you must not habitually use one as your go-to comparison method.
For instance if we edit:

    if condition:          if condition:
        blah                   blah
        blah          ->       blah
        blah               blah

(the last `blah` has moved out of the `if` block, changing the program's meaning), then nothing shows under "diff -b" or "diff -w". With a different kind of language there would be a non-whitespace difference due to the changing position of a brace or other delimiting token.
Otherwise, exactly the remark I was thinking of making.
I need to disagree with you. I've been in a few positions where one engineer suddenly decided to rewrite parts of the code base without any input from other engineers. It's a huge blow to team morale, and it gave me a fear of writing code in that team.
Every time I wrote a piece of code, I wondered how long it would be there for, I understand that code evolves, but seeing your code being rewritten after a week is no fun, and it's a huge blow to your confidence as well.
I understand that code needs to be rewritten, because of requirement changes, or different understanding of the original problem, but talk about it, bring it to the attention of the team as to why you feel we need to rewrite. Make sure that the team understands why, and sees the value in it.
Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
> Every time I wrote a piece of code, I wondered how long it would be there for, I understand that code evolves, but seeing your code being rewritten after a week is no fun, and it's a huge blow to your confidence as well.
Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Swallow your pride and approach the person who rewrote your code: "Hey, I noticed that you rewrote the code I wrote last week. Mind telling me what was wrong with it so that I can learn from you?" If the person answers "No" or "I don't have time" that person is an asshole and you are in the wrong place. But if the person answers "Sure! Let's schedule a meeting in a conference room with a whiteboard this afternoon and I'll explain what was wrong with it!" then you are in a great spot!
> Just rewriting code on your own is a big no for me, and to me, breaks the trust that we had.
Not for me. I trust my teammates' judgment. They know when it's a good idea to rewrite my code and when it's not. No point wasting time holding meetings and bikeshedding over minutiae for work that can be done in a few hours. 9 times out of 10 they will make the right call. 1 time out of 10 they won't, and we revert - no damage done.
I think you’re missing the point as to why I might lose confidence. I’m fine with people being better than me, especially when they explain why they did certain things. But we work as a team, that means that before we do work, we agree on one thing. If suddenly an engineer decides a week later he wants to do something else, then that’s not cool. Even if he has a valid reason for it. Instead you talk to the team, and agree to change.
Again, there’s a reason why we work in teams, and not as individuals. I trust my teammates too, but rewriting code without any discussion breaks that trust for me.
You can discuss when the rewrite is done, that way it wastes less of your time: you can talk about non-hypothetical existing thing. If the discussion shows that the rewrite is bad then it's his problem: he has to fix or revert.
Plus even the initial rewrite is more time-consuming to do if he doesn't discuss it with the original author beforehand.
> Here's some tough love. If people frequently rewrite code that you write it is because they are stronger developers than you are. You, the junior developer, can either sulk about it and feel miserable or realize what an amazing opportunity you have to improve.
Why is this condescending tone and unfounded assumption acceptable here?
It's like, Well, lookie here, a stranger with a generic opinion! What an amazing opportunity for me to assume specifics and talk down to him! You're welcome!
I care more about the way it gets replaced. I constantly learn from my code being replaced, and from replacing code myself. However, I doubt anyone likes code being pushed onto them. That's how it feels when someone decides to rewrite code without anyone's knowledge: they already did the work, so they're expecting it to be merged in.
I know that programmers are humans, but I think paying too much attention to people's feelings is what is dragging the IT industry down. If you have self-esteem issues, go see a shrink.
It’s not about personal feelings or self esteem, it’s about trust within the team, and confidence that when you do something the team sticks to it. When a rewrite is required, the team talks about it and then acts.
> I totally disagree one million percent! If you are on my team and you want to rewrite code I wrote (cause it sux) then do it! Don't ask, just DO IT! DO IT NOW! Have a blast, tear it apart, rewrite all my shitty abstractions and see if you can do it better. If the result is better code, then awesome! I learn something and the software gets better. If the result is worse code, then awesome! I'll tell you why, you'll learn something and your improved understanding will allow you to write better code.
I see your point of view.
There is a need to detach from your code while still being passionate in programming.
The key is the mindset: programming so that people enjoy what you're programming and not just for the sake of programming.
Also, to note, the refactor presented should objectively serve the same cause/requirement better than the current code.
I've been a lead myself, and what danabramov nailed here is that he realized this:
> My code traded the ability to change requirements for reduced duplication, and it was not a good trade.
I think the key to most of this situation is right there. It's also something the original author largely misses.
The key is this: want.
If you want to rewrite some code you find, then kindly leave that keyboard and go play with some toys. This, "wanting", is the actual origin of the author's anecdote. "I saw this and I thought it was bad and I felt the urge to rewrite it". Ok, but don't.
What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
And even if the answer is yes, then in what aspects does it need to be better? What does "better" mean for this piece of code? Did you want to just make it cooler looking because you didn't like it? Or have you actually detected that some of the needs this code was supposed to fulfil aren't being fulfilled? Of course, if you only just saw the code and do not know what those needs are, then you're in no position to rewrite it just because you want to.
But then again, if there is indeed motive to rewrite it, by all means, do.
Following this idea will probably lead to having to ask, sure. But you won't be asking for permission, you'll ask because you need information before you can decide.
> What is missing is knowledge on the circumstances of the code. You see some piece of code and you're only seeing that, the code itself. But why is it that way? Is this code supposed to be modified/extended/reduced/deleted in a near future? Frequently? Or simply, as a first question: Does this code need to be better?
That's exactly why you should always be very careful refactoring old code that you don't fully understand. It might look like spaghetti code that contains a lot of weird artifacts, when in reality it was once clean code that had to be edited dozens of times to handle edge cases. Not that it can't be refactored, just that it's likely not to look much cleaner if it has to provide the same functionality. Refactoring old code just because it looks ugly is often a waste of time and money.
This is fine in some environments, but not all. I worked on a code base once that had over 150 engineers. Teams were broken up mainly along feature and UX lines, but there were a few cases where some features were used in multiple user experiences. I worked on one of those features.
There was an engineer on one of the user experience teams that didn't understand this and decided to rewrite a portion of our code to make it more performant in their UX. In doing so, this engineer introduced bugs into our feature that adversely affected other user experiences within the application, but was not apparent in their own implementation. Had our team known about this, we would have very easily 1. been able to point out the bugs 2. helped the engineer with a better solution.
This wasn't a small effort, the engineer's team received a requirement from their PM, groomed the ticket, architected the plan, assigned the ticket to the engineer to work, the engineer wrote the code, then had at least two reviewers on their team approve the change before merging it. So we're talking multiple points of failure, I don't blame the engineer individually.
The fallout came a week after the code was deployed when the UX team flipped on the feature toggle. They checked their UX, it looked good, and continued on. Meanwhile other UX teams started seeing crashes coming from our feature. (the crash didn't manifest in testing because it was toggled off, which was a failure on the UX team)
This wrecked collaboration and trust across our codebase in multiple ways and led to increased overhead processes. We dealt with the initial fallout from UX teams not trusting our feature, which then evolved into UX teams not trusting each other, UX teams not trusting feature teams, feature teams not trusting anyone, and no one trusting the procedures in place.
Yes, there were multiple failures. But at the end of the day, this whole scenario could have been prevented with a one-line message: "Why did you implement this in this way?" No one was emotionally invested in the code that was changed; it was for all intents and purposes crap code. But the crap code worked and the new code didn't, and a simple courtesy check would have saved a ton of time, money, and trust.
> If you are on my team and you want to rewrite code I wrote (cause it sux) then do it!
Sure, rewriting is fine, but ideally it would come in the form of a PR rather than being checked in, so if you had valid reasons for writing it that way, it can be reverted before anyone else layers code on top of the bad rewrite.
Redoing work you just did is tantamount to criticism. I agree that everyone should welcome constructive criticism, but some tact is necessary in applying it.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong. It's the company's code. Having a false sense of ownership towards that code will just cause grief for that engineer and friction for his or her team.
I think some places value a sense of ownership because that person then becomes responsible for maintaining that code and reviewing changes. There is someone accountable for it. Not that they have any special rights to that code.
Please don't consider refactoring or rewriting equivalent to redoing. Refactoring or rewriting means that I'm adding my name to the code, but I'm NOT removing yours. You still did the hard work of coming up with the original functionality, even if it is later changed. Refactoring emphatically does not mean that your original work had no value.
Like if someone were to change all my spelling and grammatical errors in my top comment. I wouldn't mind at all. Hell, someone could even rephrase the comment entirely and remove the weak points and emphasize the strong ones as long as the main message is the same. I'm not a native English speaker so I'd probably learn from the experience.
> I also think that a salaried engineer who thinks that a piece of code he or she (but almost always he) wrote is "his" or "hers" is totally wrong
This. If a developer gets offended when there is an issue in their code or the way it got implemented, either they are not mature enough or there is a cultural issue in the team.
The point is, code "ownership" shouldn't be understood as a term of property - of course it isn't the property of the programmer but the company - but in terms of responsibility.
This might be different in very large teams, but usually, once you write a piece of code, you are the prime person responsible for maintaining it. And as long as this is the case, I would expect to be involved in any significant change to the code. Of course, I am fine with changes which also transfer the responsibility :)
What projects is everyone working on that they have the time to keep rewriting each other's code all the time? :)
If such teams did code review, maybe they'd get it right on the first commit and would only have to refactor when the requirements change or there's a clear benefit.
That sense of "ownership" in some small piece of the product? It helps build loyalty to the project, and the company. Seeing my baby with that brand name on it helps me feel part of the team. I've left many a project that didn't offer me the loyalty of letting me fix my own mistakes (after some guidance).
Toe-stepping aside... how else are you going to develop this apparently inefficient coder without including them in the process of reforming this code? This isn't just about ego, this is about skill. This is about the code I'm going to write in the future. And here's your opportunity, written in our codebase.